XPhoneBERT is the first pre-trained multilingual model for phoneme representations in text-to-speech (TTS). It uses the BERT-base architecture and was pre-trained on 330 million phoneme-level sentences from nearly 100 languages.
Speech Synthesis
Transformers
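
To make "BERT-base architecture" concrete, here is a minimal sketch that counts the parameters in a BERT-base sized encoder stack. The dimensions (12 layers, hidden size 768, 12 attention heads, feed-forward size 3072) are the standard BERT-base values and are assumed here to carry over to XPhoneBERT unchanged. The names `EncoderConfig`, `linear_params`, `layer_params`, and `encoder_params` are hypothetical helpers for illustration, not part of any library. Embedding tables are left out because the description above does not state the phoneme vocabulary size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Standard BERT-base encoder dimensions (assumed to apply to XPhoneBERT)."""
    num_layers: int = 12
    hidden_size: int = 768
    num_heads: int = 12
    ffn_size: int = 3072


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def layer_params(cfg: EncoderConfig) -> int:
    """Parameters in one Transformer encoder layer."""
    h = cfg.hidden_size
    # Query, key, value, and output projections of self-attention.
    attention = 4 * linear_params(h, h)
    # Two-layer position-wise feed-forward network.
    feed_forward = linear_params(h, cfg.ffn_size) + linear_params(cfg.ffn_size, h)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * h)
    return attention + feed_forward + layer_norms


def encoder_params(cfg: EncoderConfig) -> int:
    """Parameters in the encoder stack, excluding embeddings and output heads."""
    return cfg.num_layers * layer_params(cfg)


cfg = EncoderConfig()
print(f"head dimension:   {cfg.hidden_size // cfg.num_heads}")
print(f"per-layer params: {layer_params(cfg):,}")
print(f"encoder params:   {encoder_params(cfg):,}")
```

Under these assumptions the encoder stack alone comes to roughly 85 million parameters. The full model size also depends on the embedding tables, which this sketch deliberately omits.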